Good vs. Bad Models: A Perplexity Comparison

A "good" model, trained on relevant data, assigns high probability to a well-formed sentence, resulting in low perplexity. A "bad" model, trained on poor or irrelevant data, is more "surprised" by the same sentence, leading to high perplexity.

[Figure: side-by-side comparison of a Good Model (low PPL) and a Bad Model (high PPL), each panel showing its training corpus and the bigram probabilities learned from it, followed by an evaluation of both models on the same test sentence.]
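The comparison can be sketched numerically. The snippet below trains two tiny bigram models with add-alpha smoothing and evaluates both on the same test sentence; the corpora, smoothing constant, and test sentence are illustrative assumptions, not taken from the original figure.

```python
import math
from collections import Counter

def train_bigram(corpus, vocab_size, alpha=0.1):
    """Learn add-alpha smoothed bigram probabilities P(w2 | w1) from a toy corpus."""
    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # sentence boundary markers
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            contexts[w1] += 1
    # Smoothing gives unseen bigrams a small nonzero probability,
    # so the "bad" model's perplexity stays finite.
    return lambda w1, w2: (bigrams[(w1, w2)] + alpha) / (contexts[w1] + alpha * vocab_size)

def perplexity(prob, sentence):
    """PPL = 2^(-average log2 probability per predicted token)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log2_p = sum(math.log2(prob(w1, w2)) for w1, w2 in zip(tokens, tokens[1:]))
    return 2 ** (-log2_p / (len(tokens) - 1))

# Hypothetical corpora: one matches the test sentence's domain, one does not.
good_corpus = ["the cat sat on the mat", "the dog sat on the mat", "the cat ran on the mat"]
bad_corpus  = ["stocks fell sharply today", "markets closed lower again", "bond yields rose quickly"]
test_sentence = "the cat sat on the mat"

vocab = {w for s in good_corpus + bad_corpus + [test_sentence] for w in s.split()} | {"</s>"}
good_ppl = perplexity(train_bigram(good_corpus, len(vocab)), test_sentence)
bad_ppl  = perplexity(train_bigram(bad_corpus, len(vocab)), test_sentence)
print(f"good model PPL: {good_ppl:.2f}")
print(f"bad model PPL:  {bad_ppl:.2f}")
```

The good model has seen every bigram in the test sentence, so each conditional probability is high and the perplexity is low; the bad model falls back to the smoothed floor for every bigram, so its perplexity is roughly the vocabulary size.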